Do Computers Lie?

We have constantly been told that “Computers don’t lie”. Yes, in fact, computers don’t lie, but neither do they speak the truth. A computer does what its master programs it to do. Similarly, a model won’t lie unless the Machine Learning Engineer makes it lie.

Machine Bias

A nice episode of the podcast You Are Not So Smart came out last year. Here is an excerpt from it:

“I want a machine-learning algorithm to learn what tumors looked like in the past, and I want it to become biased toward selecting those kind of tumors in the future,” explains philosopher Shannon Vallor at Santa Clara University. “But I don’t want a machine-learning algorithm to learn what successful engineers and doctors looked like in the past and then become biased toward selecting those kinds of people when sorting and ranking resumes.”

The Problem

Machine Bias can occur due to many factors; one of the most common is a biased training dataset.

Below is an example of how Google Translate applies its bias (primarily due to the nature of its biased training data): the following text is translated into a gender-neutral language and then back to English.

[Images: Google Translate round-trip translation, with gendered pronouns reassigned after translating to a gender-neutral language and back to English]
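For the curious, here is a rough sketch (not part of the original kernel) of how such a round trip could be scripted with the googleLanguageR package. It needs Google Cloud credentials, the service-account file name below is a placeholder, and Turkish is used only because its pronouns are gender-neutral; the language choice is illustrative.

library(googleLanguageR)
gl_auth("my-service-account.json")   # placeholder credentials file

original  <- "She is a doctor. He is a nurse."
# Translate into a gender-neutral language, then back to English
neutral   <- gl_translate(original, target = "tr")$translatedText
roundtrip <- gl_translate(neutral,  target = "en")$translatedText
roundtrip  # bias in the training data may reassign the genders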

The Solution

The first step in solving any problem is accepting that the problem exists. Let’s accept that fact and see how the Kaggle Survey results can help the community tackle Machine Bias.

Libraries
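The original library-loading chunk is not shown here; below is a minimal sketch of the packages the later chunks appear to rely on.

library(tidyverse)   # dplyr, ggplot2, readr::parse_number(), etc.
library(scales)      # percent(), percent_format()
library(viridis)     # scale_fill_viridis()
library(cowplot)     # plot_grid()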

Ignorance is Bliss - but not always!

The plot above demonstrates how often these questions about Model Fairness / Bias have been ignored.

While the salary question led 15% of respondents not to answer, the questions about Reproducibility, Explainability and Bias led 37% of respondents to skip answering. The salary question is included here as a point of comparison, to show how much worse these questions fare.
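As a rough sketch (not the original plotting code), the skip rate per question could be computed along these lines, assuming the multiple-choice responses are loaded into a data frame named responses:

# Share of respondents who left each question blank (NA or empty string)
responses %>%
  summarise(across(everything(), ~ mean(is.na(.x) | .x == ""))) %>%
  pivot_longer(everything(), names_to = "question", values_to = "skip_rate") %>%
  arrange(desc(skip_rate))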

Reproducibility, Explainability and Bias

To get a better perspective on the volume of respondents, below is the same plot as above, but with the absolute numbers of respondents for each option.

Fairness and Bias:

Question                                                     No opinion; I do not know   Not at all important   Slightly important   Very important   NA (skipped)
Being able to explain ML model outputs and/or predictions    2.9%                        1.6%                   17.0%                41.1%            37.4%
Fairness and bias in ML algorithms                           5.4%                        2.3%                   19.0%                36.0%            37.4%
Reproducibility in data science                              3.8%                        1.0%                   14.9%                42.9%            37.4%

Model Bias & Model Fairness

Gender

# Gender split within `they` (defined earlier in the kernel as those rating Fairness & Bias as Very Important)
they %>% group_by(`What is your gender? - Selected Choice`) %>% count() %>% ungroup() %>% 
  rename("Gender" = `What is your gender? - Selected Choice`) %>% 
  mutate(n = n / sum(n),
         perc = percent(n)) %>% 
   ggplot() + geom_col(aes(Gender, n, fill = Gender), show.legend = FALSE) +
   geom_label(aes(x = Gender, y = n - 0.05, label = percent(n)),
           # hjust=0, vjust=0, size = 4, colour = 'black',
            fontface = 'bold') +
    scale_fill_viridis(discrete = T, option = "E") +

  scale_y_continuous(labels = percent_format()) +
  theme_minimal() + 
  theme(axis.text = element_text(angle = 45, size = 6)) +
  labs(title = "They",
       subtitle = "Perception on Fairness and Model Bias ",
       x = "Gender",
       y = "Percentage of Respondents (other than NAs)") -> p1



not_they %>% group_by(`What is your gender? - Selected Choice`) %>% count() %>% ungroup() %>% 
  rename("Gender" = `What is your gender? - Selected Choice`) %>% 
  mutate(n = n / sum(n),
         perc = percent(n)) %>% 
    ggplot() + geom_col(aes(Gender, n, fill = Gender), show.legend = FALSE) +
   geom_label(aes(x = Gender, y = n - 0.05, label = percent(n)),
           # hjust=0, vjust=0, size = 4, colour = 'black',
            fontface = 'bold') +
    scale_fill_viridis(discrete = T, option = "E") +

  scale_y_continuous(labels = percent_format()) +
  theme_minimal() + 
  theme(axis.text = element_text(angle = 45, size = 6)) +
  labs(title = "Not They",
       subtitle = "Perception on Fairness and Model Bias ",
       x = "Gender",
       y = "Percentage of Respondents (other than NAs)") -> p2


cowplot::plot_grid(p1,p2)

Let us create a new KPI called the They - Not They Ratio (T_NT_Ratio) to give a different perspective on this comparison; a rough sketch of how it could be computed follows.
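A minimal sketch of one way such a ratio could be computed, assuming they and not_they are the subsets used throughout (the helper name gender_share is illustrative, not the kernel's original code):

# Share of each gender within a subset of respondents
gender_share <- function(df) {
  df %>%
    count(Gender = `What is your gender? - Selected Choice`) %>%
    mutate(share = n / sum(n))
}

# T_NT_Ratio > 1: the group is over-represented among those who rate
# Fairness & Bias in ML as Very Important
gender_share(they) %>%
  inner_join(gender_share(not_they), by = "Gender",
             suffix = c("_they", "_not_they")) %>%
  mutate(T_NT_Ratio = share_they / share_not_they) %>%
  arrange(desc(T_NT_Ratio))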

  • There is a 5.1 percentage-point (PP) difference in the share of female respondents between those who perceive Model Fairness & Bias in ML as Very Important and the others.

  • While this could be read as the female gender usually being the one affected by these biases, it is also important to realize that male Kagglers don’t echo the same sentiment as their female counterparts. After all, a healthy model is what we all want, isn’t it?

  • Using the T_NT_Ratio index we created, we can see that female Kagglers, and those who selected options other than Male, rank above male Kagglers in their perception of Model Bias and Fairness.

Age

Age doesn’t seem to reveal anything straight away, probably because there are a lot of different age brackets. Let us try a bit of feature engineering to club them into two groups: less than 30 and 30+.

they %>% 
  mutate(age_grp = ifelse(parse_number(`What is your age (# years)?`) < 30,
                          "Less than 30",
                          "30+")) %>% 
  group_by(age_grp) %>% count() %>% ungroup() %>% 
  rename("Age" = age_grp) %>% 
  mutate(n = n / sum(n),
         perc = percent(n)) %>% 
   ggplot() + geom_col(aes(Age, n, fill = Age), show.legend = FALSE) +
   geom_label(aes(x = Age, y = n - 0.05, label = percent(n)),
           # hjust=0, vjust=0, size = 4, colour = 'black',
            fontface = 'bold') +
    scale_fill_viridis(discrete = T, option = "E") +
  scale_y_continuous(labels = percent_format()) +
  theme_minimal() + 
  theme(axis.text = element_text(angle = 45, size = 6)) +
  labs(title = "They",
       subtitle = "Perception on Fairness and Model Bias ",
       x = "Age",
       y = "Percentage of Respondents (other than NAs)") -> p1



not_they %>% 
   mutate(age_grp = ifelse(parse_number(`What is your age (# years)?`) < 30,
                          "Less than 30",
                          "30+")) %>% 
  group_by(age_grp) %>% count() %>% ungroup() %>% 
  rename("Age" = age_grp) %>% 
  mutate(n = n / sum(n),
         perc = percent(n)) %>% 
    ggplot() + geom_col(aes(Age, n, fill = Age), show.legend = FALSE) +
   geom_label(aes(x = Age, y = n - 0.05, label = percent(n)),
           # hjust=0, vjust=0, size = 4, colour = 'black',
            fontface = 'bold') +
    scale_fill_viridis(discrete = T, option = "E") +
  scale_y_continuous(labels = percent_format()) +
  theme_minimal() + 
  theme(axis.text = element_text(angle = 45, size = 6)) +
  labs(title = "Not They",
       subtitle = "Perception on Fairness and Model Bias ",
       x = "Age",
       y = "Percentage of Respondents (other than NAs)") -> p2


cowplot::plot_grid(p1,p2)

This plot suggests that younger Kagglers need to be brought up to speed on the implications of Model Bias and Fairness more than their older counterparts. That leads us to another important question: what do they do?

Students vs Professionals

they %>% 
  mutate(title = ifelse(`Select the title most similar to your current role (or most recent title if retired): - Selected Choice` == "Student",
                          "Student",
                          "Professional")) %>% 
  group_by(title) %>% count() %>% ungroup() %>% 
  rename("Title" = title) %>% 
  mutate(n = n / sum(n),
         perc = percent(n)) %>% 
   ggplot() + geom_col(aes(Title, n, fill = Title), show.legend = FALSE) +
   geom_label(aes(x = Title, y = n - 0.05, label = percent(n)),
           # hjust=0, vjust=0, size = 4, colour = 'black',
            fontface = 'bold') +
    scale_fill_viridis(discrete = T, option = "C") +
  scale_y_continuous(labels = percent_format()) +
  theme_minimal() + 
  theme(axis.text = element_text(angle = 45, size = 6)) +
  labs(title = "They",
       subtitle = "Perception on Fairness and Model Bias ",
       x = "Title",
       y = "Percentage of Respondents (other than NAs)") -> p1


not_they %>% 
  mutate(title = ifelse(`Select the title most similar to your current role (or most recent title if retired): - Selected Choice` == "Student",
                          "Student",
                          "Professional")) %>% 
  group_by(title) %>% count() %>% ungroup() %>% 
  rename("Title" = title) %>% 
  mutate(n = n / sum(n),
         perc = percent(n)) %>% 
    ggplot() + geom_col(aes(Title, n, fill = Title), show.legend = FALSE) +
   geom_label(aes(x = Title, y = n - 0.05, label = percent(n)),
           # hjust=0, vjust=0, size = 4, colour = 'black',
            fontface = 'bold') +
  scale_fill_viridis(discrete = T, option = "C") +
  scale_y_continuous(labels = percent_format()) +
  theme_minimal() + 
  theme(axis.text = element_text(angle = 45, size = 6)) +
  labs(title = "Not They",
       subtitle = "Perception on Fairness and Model Bias ",
       x = "Title",
       y = "Percentage of Respondents (other than NAs)") -> p2


cowplot::plot_grid(p1,p2)

  • As with the previous insight, it again appears that students need to be educated about the concepts of Model Bias and Fairness, since there is a ~4 PP difference in the share of students between They and Not They.

Undergraduate Majors

they %>% 
  mutate(UG = ifelse(`Which best describes your undergraduate major? - Selected Choice` %in% c("Computer science (software engineering, etc.)","Information technology, networking, or system administration"),
                          "CS",
                          "Non_CS")) %>% 
  group_by(UG) %>% count() %>% ungroup() %>% 
  rename("UG" = UG) %>% 
  mutate(n = n / sum(n),
         perc = percent(n)) %>% 
   ggplot() + geom_col(aes(UG, n, fill = UG), show.legend = FALSE) +
   geom_label(aes(x = UG, y = n - 0.05, label = percent(n)),
           # hjust=0, vjust=0, size = 4, colour = 'black',
            fontface = 'bold') +
  scale_fill_viridis(discrete = T, option = "D") +
  scale_y_continuous(labels = percent_format()) +
  theme_minimal() + 
  theme(axis.text = element_text(angle = 45, size = 6)) +
  labs(title = "They",
       subtitle = "Perception on Fairness and Model Bias ",
       x = "Title",
       y = "Percentage of Respondents (other than NAs)") -> p1


not_they %>% 
   mutate(UG = ifelse(`Which best describes your undergraduate major? - Selected Choice` %in% c("Computer science (software engineering, etc.)","Information technology, networking, or system administration"),
                          "CS",
                          "Non_CS")) %>% 
  group_by(UG) %>% count() %>% ungroup() %>% 
  rename("UG" = UG) %>% 
  mutate(n = n / sum(n),
         perc = percent(n)) %>% 
   ggplot() + geom_col(aes(UG, n, fill = UG), show.legend = FALSE) +
   geom_label(aes(x = UG, y = n - 0.05, label = percent(n)),
           # hjust=0, vjust=0, size = 4, colour = 'black',
            fontface = 'bold') +
  scale_fill_viridis(discrete = T, option = "D") +
  scale_y_continuous(labels = percent_format()) +
  theme_minimal() + 
  theme(axis.text = element_text(angle = 45, size = 6)) +
  labs(title = "Not They",
       subtitle = "Perception on Fairness and Model Bias ",
       x = "Title",
       y = "Percentage of Respondents (other than NAs)") -> p2


cowplot::plot_grid(p1,p2)

  • Computer Science / IT engineers show a difference of ~3.4 PP between They and Not They, which suggests they need more attention in the mindset change than non-CS majors (even though that is also required); at least this can help prioritise where to start any campaigning.

R vs Python vs more

  • R users are more likely to perceive Model Bias and Fairness as Very Important than their Python counterparts.

All Countries

Countries with at least 100 respondents

  • Chile leads the pack if all countries are considered; if only the countries with at least 100 respondents are selected, South Africa, followed by Nigeria, leads the pack with a healthy They to Not They ratio (a sketch of the 100-respondent cutoff follows).
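A minimal sketch of how the 100-respondent cutoff could be applied, with illustrative object and column names:

# Countries with at least 100 respondents
eligible_countries <- responses %>%
  count(Country = `In which country do you currently reside?`) %>%
  filter(n >= 100) %>%
  pull(Country)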

Media Sources

  • The above plot makes it very clear that podcasts like Partially Derivative, Data Skeptic and Linear Digressions are the places where Kagglers who strongly believe in ML Bias and Fairness get their news.

  • Nate Silver’s 538 makes an entry into the top 5 media sources ranked by the T_NT_Ratio KPI.

  • The Kaggle Forums seem to need more work in encouraging or initiating discussions about ML Bias and Fairness.

Work Industry

  • Kagglers in industries like Non-Profit/Service and Government/Public Service have a better perception of the importance of Model Fairness and Bias.

  • The above plot once again emphasises the importance of educating students about Model Bias and Fairness.
  • It is also unhealthy to see sectors like Military and Internet-based Services falling behind, as those are places where model evaluation is crucial and can have serious consequences.

Data Scientist?

  • Those who consider themselves Definitely a data scientist are also more likely to believe in the importance of Model Bias and Fairness, while those who do not consider themselves data scientists sit at the other end of the spectrum with the lowest They to Not They ratio!

Type of Data

  • Kagglers who use Genetic Data lead the table in the importance they give to Model Bias
  • Kagglers who use Image Data and Video Data are the least bothered

MOOC - Online course Platform

  • Ironically, Kaggle Learn, along with Coursera and Udemy, tops the list of places where people who do not think Model Bias is very important learn Data Science / Machine Learning
  • Kagglers who learn DS from Google Developers, Theschool.ai and Fast.ai lead in thinking that Model Bias is very important

Percentage of Data Projects

  • As you can see above, there is very minimal demand in the workplace to explore unfair ML bias.
  • In fact, only ~1.2K of the ~13K respondents who answered this question reported that more than 50% of their data projects involve exploring unfair bias (1.2K / 13K is roughly 9%, i.e. less than 10%).

Metrics for Model Success

  • While it is obvious, it is still a fact that most people say the usual model accuracy metrics are what they consider when determining whether a model is successful or not
  • Revenue / business goals receive almost 2x more votes than metrics that consider unfair bias

Difficulty in implementing a Fair / Bias-free ML Model

  • Collecting relevant data without bias is one of the most prominent difficulties in building a bias-free and fair model
  • That is followed by picking the right evaluation metric for the model’s ML bias, and identifying the group that has been unfairly targeted in the data so that the effect can be un-biased / nullified

Model Interpretability / Explaining ML Models

Who are they?

“They” refers to respondents who think Being able to explain ML model outputs and/or predictions is Very Important in Machine Learning, and “Not They” are those who answered otherwise, including Slightly important, Not at all important and similar.
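A minimal sketch of how this split could be made, assuming (as before) that the survey is loaded as responses; the column name below is illustrative and may differ slightly from the actual question text in the survey file:

explain_q <- "How do you perceive the importance of the following topics? - Being able to explain ML model outputs and/or predictions"

they     <- responses %>% filter(.data[[explain_q]] == "Very important")
not_they <- responses %>% filter(!is.na(.data[[explain_q]]),
                                 .data[[explain_q]] != "Very important")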

Gender

Age

  • The 18-29 age group has an index value almost half that of the 60-69 cohort, which shows how age and experience play a vital role in the perception of Interpretable Machine Learning

R vs Python vs more

  • Unexpectedly, SAS/STATA tops the They - Not They ratio index, followed by R and MATLAB, making users of these three languages on Kaggle the ones who most believe Model Interpretability is very important.

  • Python users need to be made more aware of IML, as Python still lags behind SQL and Julia.

Degree - Education

  • Kagglers with a Humanities background strongly believe in IML (Interpretable Machine Learning), with a TNT index of 2.28, followed by Engineering (non-CS) and Mathematics/Statistics
  • Ironically, Computer Science engineers are nowhere near the top, with an index of 1.1764

Work Industry

  • Kagglers with an Insurance or Military/Defense industry background lead the table
  • As we’ve seen before, students are nowhere close to perceiving Interpretable Machine Learning as Very Important
  • Industries like Marketing and Online Services fare even worse than students in their perception

MOOC - Online course Platform

  • Kaggle Learn, which has a dedicated course on Interpretable Machine Learning, hasn’t managed to make the top, coming in only at 5th position
  • DataCamp Kagglers feel strongly that IML is Very Important
  • Unlike in the Model Bias perception, where Fast.ai was near the top, here it is at rock bottom

Media Sources

  • KDnuggets, the O’Reilly data newsletter and the Kaggle Forums top the list based on the T_NT_Ratio index, making them the media sources from which They most get their news

Exploring model insights? - Percentage of Data Projects

ML models to be black boxes - Difficult to explain

The above plot tells us that most people are confident they can understand and explain the outputs of many, but not all, ML models. In fact, those who feel that most ML models are black boxes outnumber those who have no opinion on the matter.

Circumstances in which exploring model insights and interpreting models happens

  • The above plot indicates that the attempt to explore model insights and interpret the model mostly happens when the model has been built specifically for that purpose

  • That is followed by When first exploring a new ML model or dataset, which tells us an important point: Model Interpretability should be part of the AMC or model-refresh cycle rather than being only a first-time duty.

Preferred Methods for Model Interpretability


  • Plotting predicted vs actual results remains the standalone topper in the list of preferred methods for Interpretable Machine Learning, which is something of an irony, given that it is really an evaluation of model success rather than an interpretation of the model
  • IML techniques like LIME, SHAP and ELI5 are far down the list of preferences, which makes the point that the community still needs a lot of education on Interpretable Machine Learning techniques (a minimal LIME example is sketched below)
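For readers unfamiliar with these techniques, here is a rough sketch (not from the original kernel) of explaining individual predictions with the lime package, assuming a caret-trained classifier fit and feature data frames train_x / test_x (all illustrative names):

library(lime)

# Build an explainer from the training data and the fitted model,
# then explain the first two test cases with up to four features each
explainer   <- lime(train_x, fit)
explanation <- explain(test_x[1:2, ], explainer, n_labels = 1, n_features = 4)
plot_features(explanation)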

Reproducibility

Tools and Methods for ML Reproducibility

  • Documentation and human-readable code lead the list of preferred tools and methods for making ML reproducible
  • Sophisticated options like Docker or VirtualBox are far less preferred, roughly 50% less than the easy options above
  • Sharing code on GitHub, or sharing code along with data, is another moderately preferred method, and a less expensive one too
  • Sharing code in a hosted environment like Kaggle Kernels is in a better position than sophisticated techniques like Docker, yet falls well behind simply uploading to GitHub

Barriers preventing work from being easier to reuse and reproduce

  • Producing reproducible machine learning work is considered too time-consuming, which is the most prominent reason Kagglers don’t prefer doing it.
  • The next most prominent point is not having enough incentive to make machine learning work reproducible.

Recommendations from these Insights

Media Coverage about Bias / IML

Undoing Bias / Improving IML - Papers and Attempts